The Conspiracy of Random Predictors and Model Violations against Classical Inference in Regression
Abstract
We review the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under “model misspecification.” This form of inference, which is pervasive in econometrics, relies on the “sandwich estimator” or “heteroskedasticity-consistent estimator” of standard error. Whereas common practice in statistics assumes models to be correct and inference to be conditional on the predictors, White permits models to be “incorrect” and predictors to be random. Careful reading of his theory shows that it is in fact a synergistic effect — a “conspiracy” — of model misspecification and randomness of the predictors that has the deepest consequences for statistical inference. In this review we limit ourselves to linear least squares regression as the demonstration object, but the qualitative insights hold for all forms of regression. We will see that the term “heteroskedasticity-consistent estimator” is misleading because nonlinearity is a more consequential form of model deviation than heteroskedasticity, and both forms are handled asymptotically correctly by the sandwich estimator. The same analysis shows that a valid alternative to the sandwich estimator is provided by the “pairs bootstrap.” We continue with a novel asymptotic comparison of the sandwich estimator and the standard error estimator from classical linear models theory. The comparison shows that when standard errors from linear models theory deviate from their sandwich analogs, they are usually too liberal, but occasionally they can be too conservative as well. We conclude by answering questions that would occur to statisticians acculturated to the assumption of model correctness and conditionality on the predictors: (1) Why should we be interested in inference for models that are not correct? (2) What are the arguments for conditioning on predictors, and why might they not be valid?
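The contrast drawn in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a random predictor drawn uniformly on (-1, 1), a quadratic mean function (so the fitted straight line is misspecified), and constant noise variance (so there is no heteroskedasticity at all). The function names, the sample size, and the number of bootstrap replicates are illustrative choices, and the sandwich formula used is the basic (HC0) form.

```python
import numpy as np


def ols_with_standard_errors(X, y):
    """OLS fit returning coefficients, classical SEs, and sandwich (HC0) SEs."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta

    # Classical linear-models SE: assumes linearity and constant noise variance.
    sigma2 = resid @ resid / (n - p)
    se_classical = np.sqrt(sigma2 * np.diag(xtx_inv))

    # Sandwich SE: (X'X)^-1 [sum_i r_i^2 x_i x_i'] (X'X)^-1.
    meat = X.T @ (X * (resid ** 2)[:, None])
    se_sandwich = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classical, se_sandwich


def pairs_bootstrap_se(X, y, n_boot, rng):
    """Resample (x_i, y_i) pairs jointly and refit OLS on each resample."""
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        draws[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return draws.std(axis=0, ddof=1)


# Assumed data-generating process (illustrative only):
# random predictor, nonlinear mean, homoskedastic noise.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1.0, 1.0, size=n)
y = x ** 2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x])  # intercept and slope

beta, se_classical, se_sandwich = ols_with_standard_errors(X, y)
se_boot = pairs_bootstrap_se(X, y, n_boot=500, rng=rng)

print("slope estimate:      ", beta[1])
print("classical SE (slope):", se_classical[1])
print("sandwich SE (slope): ", se_sandwich[1])
print("pairs bootstrap SE:  ", se_boot[1])
```

Under these assumptions the classical standard error of the slope should come out noticeably smaller than the sandwich and pairs-bootstrap values, which should roughly agree with each other. Because the noise variance is constant here, the discrepancy comes entirely from nonlinearity combined with the randomness of the predictor, which is the "conspiracy" the abstract describes.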
Similar resources
A Conspiracy of Random X and Model Violation against Classical Inference in Linear Regression
Following the econometric literature on model misspecification, we examine statistical inference for linear regression coefficients $\beta_j$ when the predictors are random and the linear model assumptions of first and/or second order are violated: $E[Y \mid X_1, \ldots, X_p]$ is not linear in the predictors and/or $V[Y \mid X_1, \ldots, X_p]$ is not constant. Such inference is meaningful if the linear model is seen as a ...
Models as Approximations: A Conspiracy of Random Predictors and Model Violations Against Classical Inference in Regression
Abstract. We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under “model misspecification,” that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the “...
Models as Approximations: A Conspiracy of Random Regressors and Model Misspecification Against Classical Inference in Regression
Abstract. More than thirty years ago Halbert White inaugurated a “model-robust” form of statistical inference based on the “sandwich estimator” of standard error. This estimator is known to be “heteroskedasticity-consistent”, but it is less well-known to be “nonlinearity-consistent” as well. Nonlinearity raises fundamental issues because regressors are no longer ancillary, hence can’t be treated ...
Bayesian Inference for Spatial Beta Generalized Linear Mixed Models
In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on the interval (0, 1) that covers symm...
Evaluation of risk factors of recurrence of Hodgkin's lymphoma using random survival forest and comparison with Cox regression model
Background: In many studies, Cox regression was used to assess the important factors that affect the survival of cancer patients based on demographic and clinical variables. The aim of this study was to determine the factors affecting the survival of patients with Hodgkin's lymphoma using the random survival forest (RSF) method and compare it with the Cox model. Methods: In this retrospective ...
Publication date: 2013